Credit Assignment Method for Learning Effective Stochastic Policies in Uncertain Domains
Authors
Abstract
In this paper, we introduce First-Visit Profit-Sharing (FVPS) as a credit assignment procedure, an important issue in classifier systems and reinforcement learning frameworks. FVPS reinforces effective rules to make an agent acquire stochastic policies that cause it to behave very robustly within uncertain domains, without pre-defined knowledge or subgoals. We use an internal episodic memory, not only to identify perceptual aliasing states but also to discard looping behavior and to acquire effective stochastic policies to escape perceptual deceptive states. We demonstrate the effectiveness of our method in some typical classes of Partially Observable Markov Decision Processes, comparing with Sarsa(λ) using a replacing eligibility trace. We claim that this approach results in an effective stochastic or deterministic policy which is appropriate for the environment.
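The abstract does not spell out the FVPS update rule, so the following is only a minimal sketch of one plausible reading: at the end of a rewarded episode, each (observation, action) rule stored in the episodic memory is reinforced once, at its first visit, with credit that decays geometrically with the distance to the reward. The function name `first_visit_profit_sharing`, the `decay` parameter, and the geometric credit function are assumptions made for illustration and are not taken from the paper.

```python
from collections import defaultdict


def first_visit_profit_sharing(weights, episode, reward, decay=0.5):
    """Reinforce each rule once per episode, at its first visit.

    Hypothetical sketch, not the authors' implementation.
    weights: mapping from (observation, action) to a rule weight.
    episode: list of (observation, action) pairs in the order taken.
    reward:  scalar reward received at the end of the episode.
    decay:   assumed geometric credit ratio, with 0 < decay < 1.
    """
    seen = set()
    first_visits = []
    for step, rule in enumerate(episode):
        # Repeated rules (loops) are skipped after their first visit.
        if rule not in seen:
            seen.add(rule)
            first_visits.append((step, rule))

    last = len(episode) - 1
    for step, rule in first_visits:
        # Rules further from the reward receive geometrically less credit.
        weights[rule] += reward * decay ** (last - step)
    return weights


if __name__ == "__main__":
    episode = [("A", "right"), ("B", "left"), ("A", "right"), ("C", "up")]
    weights = first_visit_profit_sharing(defaultdict(float), episode, reward=1.0)
    for rule, weight in weights.items():
        print(rule, weight)
```

In this sketch the repeated rule `("A", "right")` is credited only for its first occurrence, which is one simple way to avoid rewarding looping behavior. A stochastic policy could then select actions with probability proportional to the rule weights for the current observation, though that selection rule is likewise an assumption here.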
Similar resources
An Analysis of Direct Reinforcement Learning in Non-Markovian Domains
It is well known that for Markov decision processes, the policies stable under policy iteration and the standard reinforcement learning methods are exactly the optimal policies. In this paper, we investigate the conditions for policy stability in the more general situation when the Markov property cannot be assumed. We show that for a general class of non-Markov decision processes, if actual re...
Uncertain Resource Availabilities: Proactive and Reactive Procedures for Preemptive Resource Constrained Project Scheduling Problem
Project scheduling is the part of project management that deals with determining when in time to start (and finish) which activities and with the allocation of scarce resources to the project activities. In practice, virtually all project managers are confronted with resource scarceness. In such cases, the Resource-Constrained Project Scheduling Problem (RCPSP) arises. This optimization problem has...
Analysing the Effects of Reward Shaping in Multi-Objective Stochastic Games
The majority of Multi-Agent Reinforcement Learning (MARL) implementations aim to optimise systems with respect to a single objective, despite the fact that many real world problems are inherently multi-objective in nature. Research into multi-objective MARL is still in its infancy, and few studies to date have dealt with the issue of credit assignment. Reward shaping has been proposed as a mean...
Effective Learning Approach for Planning and Scheduling in Multi-Agent Domain
The point we want to make in this paper is that Profit-sharing, a reinforcement learning approach, is very appropriate to realize adaptive behaviors in a multi-agent environment. We discuss the effectiveness of Profit-sharing theoretically and empirically within a Pursuit Game where there exist multiple preys and multiple hunters. In our context of this problem, hunters need to coordinate adapt...
Filtered Reinforcement Learning
Reinforcement learning (RL) algorithms attempt to assign the credit for rewards to the actions that contributed to the reward. Thus far, credit assignment has been done in one of two ways: uniformly, or using a discounting model that assigns exponentially more credit to recent actions. This paper demonstrates an alternative approach to temporal credit assignment, taking advantage of exact or ap...